Mining enriched contextual information of scientific collaboration: A meso perspective
Abstract
Studying scientific collaboration using coauthorship networks has attracted much attention in recent years. How and in what context two authors collaborate remain among the major questions. Previous studies, however, have focused on either exploring the global topology of coauthorship networks (macro perspective) or ranking the impact of individual authors (micro perspective). Neither has provided information on the context of the collaboration between two specific authors, which may carry rich socioeconomic, disciplinary, and institutional information on collaboration. In contrast to the macro and micro perspectives, this paper proposes a novel method (meso perspective) to analyze scientific collaboration, in which a contextual subgraph is extracted as the unit of analysis. A contextual subgraph is defined as a small subgraph of a large-scale coauthorship network that captures the relationship and context between two coauthors. This method is applied to the field of library and information science (LIS). Topological properties of all the subgraphs in four time spans are investigated, including size, average degree, clustering coefficient, and network centralization. Results show that contextual subgraphs capture useful contextual information on two authors’ collaboration.
Introduction
Scientific collaboration has become increasingly prominent within and across disciplines in the past decades. The idea that scientific research is moving from a personal, discipline-based, and location-restricted practice towards a collective, problem-oriented, and geographically distributed activity is now well accepted (Sonnenwald, 2007). Scientific collaboration advances professional development and increases the integration of knowledge. In scientific collaboration, access to expertise, facilities, and connections from multiple sides is shared, yielding a stronger whole than any individual side.
In particular, it also enhances the visibility of aspiring young scientists (Beaver & Rosen, 1978, 1979). Therefore, in recent years, numerous institutional and governmental initiatives have sought to encourage collaboration among scientists, institutions, and countries. Coauthorship is an explicit and critical product of scientific collaboration, and has been used extensively to explore the patterns and potential of scientific collaboration and the impact of individual scholars. Many aspects of scientific collaboration, such as the global topology of coauthorship networks and the impact of individual authors, can be tracked by analyzing the coauthorship network. However, some questions, such as in what context two specific coauthors actually collaborate and why, remain unanswered.
Previous studies of coauthorship networks can be generally categorized into two directions. One focuses on the global structure and evolution of coauthorship networks (a macro perspective) (Barabási et al., 2002; Moody, 2004; Newman, 2001a, 2001b; Leydesdorff & Wagner, 2008). The other emphasizes various indicators of the impact or prestige of individual researchers (a micro perspective). For example, different types of centrality and weighted PageRank have been computed on coauthorship networks (Börner et al., 2005; Liu et al., 2005; Yan & Ding, 2009; Yan et al., 2010). Although aggregating coauthorship networks at the institutional and international levels can reflect language and geographical factors in scientific collaboration, it loses some information and makes personal-level factors invisible. These studies have not shed light on how to characterize and contextualize the collaborative relationship of a coauthor pair.
In large-scale coauthorship networks, searching for the relationship between two specific coauthors (i.e., two people who have co-published a paper, referred to as “a coauthor pair” in the rest of the paper) usually yields the edge between them, weighted by the number of coauthored papers (Figure 1 (a)). However, such edges disregard a large amount of contextual information about the coauthor pair. First, a direct concern is that a single edge in coauthorship networks omits important information when more than two researchers collaborate on one paper (i.e., multi-authorship). Second, a single edge in coauthorship networks cannot display the broad environment of collaboration, such as disciplinary, socioeconomic, institutional, and geographical factors (Sonnenwald, 2007). Instead, relevant authors, who are directly or indirectly involved in the collaboration of the coauthor pair, can imply rich contextual information. The subgraph formed by these relevant authors and the coauthor pair is referred to as the contextual subgraph characterized by the coauthor pair (a meso perspective). Table 1 summarizes the macro, micro, and meso perspectives of coauthorship networks.
Table 1. A summary of macro-, micro-, and meso-level perspectives
Level | Measure | Characteristics
Macro-level | size, largest component, geodesic distance, degree distribution, clustering coefficient, k-core, and so forth | detecting the global pattern of scientific collaboration
Micro-level | degree centrality, closeness centrality, betweenness centrality, eigenvector centrality, and PageRank | identifying most collective authors and ranking impact of individual authors
Meso-level | number, size, and other topological properties of contextual subgraphs | characterizing and contextualizing collaborative relationships between coauthor pairs
To address those questions, this paper defines a contextual subgraph that captures the link and context information for a coauthor pair.
More specifically, the research question addressed in this paper is “in what context do two specific coauthors actually collaborate and why”. Here we provide an example of a contextual subgraph. As shown in Figure 1, assume that people want to find out how and in what context M. Thelwall and D. Wilkinson have collaborated. By looking at the edge between the two scientists (Figure 1a), people can only learn that they have coauthored a certain number of papers. By contrast, the contextual subgraph (Figure 1b) shows seven more researchers involved in their collaboration. By examining those researchers’ affiliations, we found that M. Thelwall and D. Wilkinson are both faculty members of the Statistical Cybermetrics Research Group at the University of Wolverhampton, as are R. Binns, L. Price, and P. Musgrove. In addition, G. Harries, X.M. Li, and T. Page-Kennedy are faculty members in the same department as M. Thelwall and D. Wilkinson. Many papers coauthored by M. Thelwall and D. Wilkinson also involved the other nodes in the subgraph as coauthors (multiple authors). The contextual information supplied by the subgraph therefore tells us more about these collaborations than the single edge does.
Figure 1. (a) The edge between M. Thelwall and D. Wilkinson in the coauthorship network; (b) the contextual subgraph of M. Thelwall and D. Wilkinson.
Taking the contextual subgraph as the unit of analysis, statistical features of the topological properties of subgraphs can be investigated and correlated with other aspects of coauthorship (demographics, journal, institution, nation, mentorship, etc.) to uncover underlying mechanisms of scientific collaboration. In this paper, the method is applied to the coauthorship networks in the LIS field. In addition to diachronically analyzing the topological properties of thousands of coauthor subgraphs, this paper also explores how topological properties of contextual subgraphs correlate with the productivity and citations of coauthor pairs.
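As a rough illustration of the four topological properties investigated for each subgraph (size, average degree, clustering coefficient, and network centralization), the sketch below computes them for a toy graph in Python. It is not the authors' code. The text does not state which definitions are used, so the sketch assumes the average local clustering coefficient and Freeman's degree centralization; the function name `topological_properties` and the four-author toy graph are hypothetical.

```python
from itertools import combinations

def topological_properties(adj):
    """Summarize an undirected graph given as {node: set(neighbors)}."""
    n = len(adj)
    degrees = {v: len(nbrs) for v, nbrs in adj.items()}

    def local_clustering(v):
        # Fraction of neighbor pairs of v that are themselves connected.
        k = degrees[v]
        if k < 2:
            return 0.0
        links = sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])
        return 2 * links / (k * (k - 1))

    max_deg = max(degrees.values())
    # Freeman degree centralization (assumed definition): 1.0 for a star graph.
    centralization = (
        sum(max_deg - d for d in degrees.values()) / ((n - 1) * (n - 2))
        if n > 2 else 0.0
    )
    return {
        "size": n,
        "average_degree": sum(degrees.values()) / n,
        "clustering": sum(local_clustering(v) for v in adj) / n,
        "centralization": centralization,
    }

# Hypothetical toy subgraph: a triangle A-B-C plus one extra coauthor D of A.
toy = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
props = topological_properties(toy)
print(props)
```

Computed once per contextual subgraph, values like these could then be compared across time spans or correlated with the productivity of the coauthor pair.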
This paper is organized as follows: Section 2 reviews related work; Section 3 elaborates on the methodology and the sample data; Section 4 presents the results; and Section 5 concludes the study.
Related Work
Before 2000, studies of coauthorship networks focused on the validity of using coauthorship data to analyze research collaboration and on how coauthorship can be retrieved, refined, and analyzed (Luukkonen et al., 1992; Kretschmer, 1994; Persson & Beckmann, 1995; Melin & Persson, 1996). The coauthorship networks in these studies are usually relatively small. Beginning in 2000, several researchers started to construct large-scale networks using coauthorship data representing research collaborations in various disciplines (Newman, 2001a, 2001b, 2001d, 2004; Barabási et al., 2002; Moody, 2004). Topological properties of networks that have been much discussed include graph size, largest components, geodesic distance, degree distribution, clustering coefficient, centrality, and k-core. While Newman (2001a, 2001b) analyzed a static network at a specific point in time, Barabási et al. (2002) presented the evolution of topological properties of coauthorship networks in mathematics and neuroscience over an eight-year period (1991-98) and built a model to simulate the structural mechanisms that govern the evolution. Moody (2004) explored how variations of the global network topology in sociology collaboration networks have affected the field’s research practice in the last 30 years.
Another line of work constructs various indicators of the impact of individual authors, institutions, or countries from coauthorship network properties, taking a micro perspective (Börner et al., 2005; Liu et al., 2005; Yan & Ding, 2009). Centrality measures and adapted PageRank models are two popular approaches in such studies. Börner et al.
(2005) proposed a novel local, author-centered measure based on the entropy contribution of a single author’s impact across all of that author’s coauthorship relations. Yan and Ding (2009) compared authors’ impact ranked by PageRank and various centrality measures over a time span of 20 years and verified their usability. Liu et al. (2005) proposed a weighted PageRank algorithm that takes the number of coauthored papers into consideration. Few of those studies, however, have analyzed the relationship and context between a coauthor pair from a meso perspective. As proposed in this paper, a subgraph that captures important connections between two coauthors can fill this gap.
Meanwhile, an extensive literature has been devoted to studying the internal and external factors that affect scientific collaboration. These studies have emphasized different aspects of scientific collaboration, including:
(1) Cognitive/disciplinary factor; for example, emerging interdisciplinary areas require collaboration (Katz & Martin, 1997; Beaver, 2001; Hara, Solomon, Kim, & Sonnenwald, 2003);
(2) Geographic factor; for example, researchers who are geographically closer are more likely to collaborate (Katz, 1994; Luukkonen et al., 1992; Schubert & Braun, 1990);
(3) Organizational factor; for example, leadership and management of scientific collaboration also play a noticeable role (Finholt & Olson, 1997);
(4) Political factor; for example, governments are keen to encourage participation in scientific collaboration (Clarke, 1967; Smith, 1958);
(5) Socioeconomic factor (Maglaughlin & Sonnenwald, 2005);
(6) Resource accessibility (Cohen, 2000); and
(7) Social networks and personal factors; prestige and productivity of researchers also affect their participation in scientific collaboration (Egghe, 2008; Glänzel, 2000; Glänzel & Schubert, 2001).
However, most of the previous studies have either analyzed various possible factors theoretically and qualitatively, or verified only an individual factor with quantitative evidence. What is lacking is a method that can be used to quantitatively analyze all possible factors on a unified platform. In fact, all those factors that affect scientific collaboration are buried in the background information (e.g., nationality, affiliation, position, expertise, prestige, etc.) of coauthors in the identified contextual subgraph.
Another group of related work addresses graph mining. Subgraph extraction and matching is an emerging topic in the area of graph mining. Estrada et al. (2005) defined a novel centrality measure, referred to as subgraph centrality, which characterizes nodes in a network according to the set of subgraphs formed by walks starting and ending at the node (i.e., closed walks). The influence of a closed walk on the centrality decreases as the length of the walk increases. Their experiments showed that subgraph centrality is more discriminative for the nodes of a network than degree, betweenness, closeness, or eigenvector centrality. Faloutsos et al. (2004) extracted, from a large graph, a subgraph that best captures the relationship between two nodes, using an electrical circuit analogy. Their algorithm was adapted and applied by Ramakrishnan et al. (2005) to multi-relational graphs. Another study utilized subgraphs in measuring proximity between nodes in graphs (Koren et al., 2006). Tong and Faloutsos (2006) extended the work of Faloutsos et al. (2004) to identify the most important set of intermediate nodes among more than two predefined nodes. While these studies emphasized similarity between indirectly connected pairs of nodes, this paper concentrates on contextualizing pairs of authors who are directly connected in a coauthorship network.
On the other hand, those studies address the subgraph problem from an algorithmic perspective, while this study tailors the problem to the specific features of coauthorship networks and shows the rich possibilities of exploring scientific collaboration using the contextual subgraphs proposed in this paper.
Methodology
The contextual subgraph between a coauthor pair is defined as a subgraph of the large coauthorship graph that is formed by paths within a certain length between two directly connected authors (i.e., a coauthor pair). A contextual subgraph is thus characterized or defined exclusively by a coauthor pair. The contextual subgraph of a coauthor pair is created in two steps: 1) identify all the paths within a certain length between the two coauthors; and 2) merge those paths into a graph.
Algorithm
A modified heap-based Dijkstra path-finding algorithm is used to efficiently identify the paths within a certain length between two specific nodes in a large-scale graph (Tang et al., 2008). Length denotes the number of hops needed to reach one node from another in the undirected coauthorship network. Identified paths are further merged to form the contextual subgraph. More specifically, the approach contains two steps:
1. Enumeration of all paths within a certain length (a predefined threshold): a heap-based Dijkstra algorithm with complexity O(n log n) (where n is the size of the original graph) and a depth-first search are used to locate all the paths within a certain length between two specific nodes. Intuitively, search processes begin at the starting node and the ending node at the same time. Each process systematically explores all the neighboring nodes in sequence; for each of those neighboring nodes, it visits their unexplored neighbor nodes and records or updates all of its outgoing paths. One path is identified when the two processes visit the same node.
Thus the path is recognized by combining the recorded paths between the starting node and the coincident node, and between the coincident node and the ending node (see Figure 2). In Figure 2, suppose the starting node is 1 and the ending node is 26:
• A breadth-first search (BFS) explores the nearest neighbors of node 1 and reaches nodes 3, 4, 6, 7, 10 (Figure 2 b);
• Meanwhile, another BFS similarly explores the nearest neighbors of node 26 and reaches nodes 19, 21, 23, 24, 25 (Figure 2 c);
• The former BFS further explores all the nearest neighbors of nodes 3, 4, 6, 7, 10, and reaches nodes 2, 5, 8, 9, 11, 14, 18 (Figure 2 d);
• Meanwhile, the latter BFS explores all the nearest neighbors of nodes 19, 21, 23, 24, 25, and reaches nodes 15, 16, 18, 22 (Figure 2 e); and
• A node (i.e., node 18) is visited by both BFS processes; the algorithm ends. The shortest path between node 1 and node 26 is 1 – 10 – 18 – 21 – 26 (Figure 2 f).
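The walk-through above and the two-step construction of a contextual subgraph can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The two-ended search is written as a plain level-by-level bidirectional BFS, and path enumeration uses a depth-limited depth-first search in place of the modified heap-based Dijkstra algorithm of Tang et al. (2008). All function names, the `max_len` parameter, and the `coauthors` toy graph are hypothetical. The `figure2_partial` edge list contains only the edges stated in the walk-through (the neighbors of nodes 1 and 26 and the path 1 – 10 – 18 – 21 – 26); the rest of Figure 2 is not reproduced.

```python
from collections import defaultdict

def build_graph(edges):
    """Undirected adjacency sets from (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def _join(parents, meet):
    """Combine the two recorded half-paths at the node where the searches met."""
    left, node = [], meet
    while node is not None:
        left.append(node)
        node = parents[0][node]
    right, node = [], parents[1][meet]
    while node is not None:
        right.append(node)
        node = parents[1][node]
    return left[::-1] + right

def bidirectional_shortest_path(adj, source, target):
    """Grow BFS levels alternately from both ends until one node is seen by both."""
    if source == target:
        return [source]
    parents = [{source: None}, {target: None}]  # forward / backward search trees
    frontiers = [[source], [target]]
    side = 0
    while frontiers[0] and frontiers[1]:
        seen, other = parents[side], parents[1 - side]
        next_level = []
        for node in frontiers[side]:
            for nxt in adj[node]:
                if nxt in seen:
                    continue
                seen[nxt] = node
                if nxt in other:  # visited by both searches
                    return _join(parents, nxt)
                next_level.append(nxt)
        frontiers[side] = next_level
        side = 1 - side
    return None  # the two nodes are not connected

def paths_within(adj, source, target, max_len):
    """Step 1: yield every simple path with at most max_len edges."""
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == target:
            yield path
        elif len(path) - 1 < max_len:
            for nxt in adj[node]:
                if nxt not in path:
                    stack.append((nxt, path + [nxt]))

def contextual_subgraph(adj, a, b, max_len):
    """Step 2: merge the bounded paths between coauthors a and b into an edge set."""
    if b not in adj[a]:
        raise ValueError("a and b must be a directly connected coauthor pair")
    return {frozenset(e) for p in paths_within(adj, a, b, max_len)
            for e in zip(p, p[1:])}

# Partial edge list taken from the Figure 2 walk-through (assumption: other edges omitted).
figure2_partial = build_graph(
    [(1, n) for n in (3, 4, 6, 7, 10)] + [(26, n) for n in (19, 21, 23, 24, 25)]
    + [(10, 18), (18, 21)]
)
path = bidirectional_shortest_path(figure2_partial, 1, 26)
print(path)  # [1, 10, 18, 21, 26]

# Hypothetical coauthor graph: A and B are the coauthor pair; F only coauthored with A.
coauthors = build_graph([("A", "B"), ("A", "C"), ("C", "B"),
                         ("A", "D"), ("D", "E"), ("E", "B"), ("A", "F")])
sub2 = contextual_subgraph(coauthors, "A", "B", 2)
sub3 = contextual_subgraph(coauthors, "A", "B", 3)
print(sorted(sorted(edge) for edge in sub3))
```

Raising `max_len` from 2 to 3 pulls authors D and E into the subgraph, while F stays out because no path from A to B passes through F. The length threshold therefore controls how much context surrounds the coauthor pair.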
Journal: JASIST
Volume: 62
Publication date: 2011